Abstract — Distributed medium access control (MAC) protocols 
are essential for the proliferation of low cost, decentralized 
wireless local area networks (WLANs). Most MAC protocols are 
designed with the presumption that nodes comply with prescribed 
rules. However, selfish nodes have natural motives to manipulate 
protocols in order to improve their own performance. This often 
degrades the performance of other nodes as well as that of the 
overall system. In this work, we propose a class of protocols 
that limit the performance gain which nodes can obtain through 
selfish manipulation while incurring only a small efficiency loss. 
The proposed protocols are based on the idea of a review strategy, 
with which nodes collect signals about the actions of other nodes 
over a period of time, use a statistical test to infer whether or not 
other nodes are following the prescribed protocol, and trigger a 
punishment if a departure from the protocol is perceived. We 
consider the cases of private and public signals and provide 
analytical and numerical results to demonstrate the properties 
of the proposed protocols. 

Index Terms — Deviation-proof protocols, game theory, MAC 
protocols, repeated games. 



I. Introduction 

In wireless communication networks, multiple nodes often share a 
common channel and contend for access. To resolve contention among 
nodes, many different MAC protocols have been devised and are 
currently used in international standards (e.g., the IEEE 
802.11a/b/g protocols) [1]. When a MAC protocol is designed, two 
types of node behavior can be assumed: cooperative nodes, which 
comply with prescribed protocols, and selfish nodes, which are 
capable of manipulating prescribed protocols in order to improve 
their own performance. With cooperative nodes, a MAC protocol 
can be designed to optimize the system performance [2]-[6]. 
However, such a protocol is not robust to selfish manipulation, 
in that selfish nodes that can reconfigure the software or 
firmware may want to deviate from the protocol in pursuit of 
their self-interest [7]. Thus, selfish manipulation often results 
in a suboptimal outcome, different from the one desired by 
the protocol designer [8]-[10]. On the other hand, a MAC 
protocol can be designed assuming selfish nodes so that the 
protocol is deviation-proof in the sense that selfish nodes do 
not find it profitable to deviate from the protocol. However, 
the incentive constraints imposed by the presence of selfish 
nodes in general restrict the system performance [5], [6]. In 
this paper, we aim to resolve the tension between selfish 
manipulation and optimal performance by proposing a class of 
slotted MAC protocols that limit the performance gain from 
selfish manipulation while incurring only a small efficiency 
loss compared to the optimal performance achievable with 
cooperative nodes. 

Recently, a variety of slotted MAC protocols have been 
designed and analyzed using a game theoretic framework. 
With cooperative nodes, protocols can be designed to achieve 
system-wide optimal outcomes. In [3], a class of slotted MAC 
protocols is proposed in which nodes can self-coordinate their 
transmission slots based on their past transmission actions 
and feedback information to achieve a time division multiple 
access (TDMA) outcome. In [4], the authors propose 
generalized slotted Aloha protocols that maximize system 
throughput given a short-term fairness constraint. In 10, Q, 
variations of slotted MAC protocols with different capture 
effects, prioritization, and power diversity are studied. It has 
been demonstrated that with cooperative nodes one can obtain 
optimal throughput and expected delay as well as system 
stability. 

Selfish behavior in MAC protocols has also been analyzed 
using game theory. In [11], the authors establish the stability 
region for a slotted Aloha system with multipacket reception 
and selfish nodes. In [12], the authors study the existence of 
and convergence to Nash equilibrium in a slotted Aloha system 
where selfish nodes have quality-of-service requirements. 
Selfish behavior is often observed to lead to suboptimal 
outcomes. For example, a prisoners' dilemma phenomenon 
arises among selfish nodes using the generalized slotted Aloha 
protocols of [4]. A decrease in system throughput, especially 
when the workload increases due to the selfish behavior 
of nodes, is observed in 0, Q. In the 802.11 distributed 
MAC protocol, competition among selfish nodes results in an 
inefficient use of the shared channel in Nash equilibria [8]. 

Research efforts have been made to devise MAC protocols 
that sustain optimal outcomes among selfish nodes. In [13], 
the authors induce selfish nodes to behave cooperatively in a 
slotted random access network by introducing an intervening 
node that monitors the actions of nodes and decides its 
intervention level accordingly. Pricing has also been used as a 
method to incentivize selfish nodes. In [5], the authors avoid 
the degradation of system throughput due to selfish behavior 
by adding a cost of transmissions and retransmissions. In [14], 
the network charges nodes for each successfully transmitted 
packet, and the authors consider the problem of adjusting 
the price-per-packet to achieve a desired operating point. The 
above approaches, however, require a central entity, which may 
not be available in a distributed environment. In the case of an 
intervention mechanism, an intervening node that is capable of 
monitoring and intervening should be present in the system. In 
the case of a pricing mechanism, a billing authority is needed 
to charge payments depending on the usage of the network. 
In this paper, we propose an approach that decentralizes the 
burden of monitoring and punishing to the nodes themselves. 

To this end, we rely on the theory of repeated games 
[15] to sustain cooperation among selfish nodes. When the 
nodes in a system interact repeatedly, they can make their 
decisions dependent on their past observations. Thus, nodes 
can trigger a punishment when they observe a deviation from 
a predetermined operating point. If the loss due to punishment 
outweighs the gain from deviation, selfish nodes do not have 
an incentive to deviate from a predetermined operating point. 
The idea of using a repeated game strategy to build a deviation- 
proof protocol has recently been applied to several problems in 
communications and networking (see, for example, [16]-[19]). 
However, most existing work assumes perfect monitoring, 
where players observe decisions that other players make. With 
perfect monitoring, it is relatively easy to construct a deviation- 
proof protocol by using a trigger strategy, which is commonly 
used to prove various versions of the Folk theorem. 

In our work, we consider a scenario where the decisions 
of nodes are their transmission probabilities, which cannot 
be observed directly. In order to design a deviation-proof 
protocol, we use the idea of a review strategy [20], [21], with 
which nodes collect imperfect signals about the decisions of 
other nodes, perform a statistical test to determine whether 
or not a deviation has occurred, and trigger a punishment if 
they conclude so. Our main contributions in this paper can be 
summarized as follows. 

• We model a slotted multiple access communications 
scenario as a repeated game, which allows us to adopt 
a repeated game strategy, including a review strategy, to 
design a protocol. 

• We first consider the case where nodes observe private 
signals on the channel access outcomes. We design 
deviation-proof protocols assuming that a deviating node 
can employ only a deviation strategy using a constant 
transmission probability. We provide a necessary and 
sufficient condition for a given protocol to be deviation- 
proof. We show that the efficiency loss of a deviation- 
proof protocol can be made arbitrarily small if there is 
a statistical test that becomes perfect as more signals are 
accumulated. 

• We also consider the case where nodes observe public 
signals on the channel access outcomes. We show that 
with public signals it is possible to design near-optimal 
deviation-proof protocols even when nodes can use any 
deviation strategy. 

• Besides slotted MAC protocols, we provide a possible 
application of our design methodology to the case of 
CSMA/CA protocols with selfish nodes. 

• We illustrate the properties of the proposed protocols with 
numerical results. 

The proposed protocols are fully distributed in the sense that 
they need no central entity to coordinate the operation of nodes 
and that nodes take actions depending solely on their own local 
information without communicating with other nodes. 

The rest of this paper is organized as follows. In Section 
II, we formulate a repeated game model for slotted multiple 
access communications. In Section III, we propose and analyze 
deviation-proof protocols based on a review strategy when 
signals are private, with an example presented in Section IV. 
In Section V, we investigate deviation-proof protocols when 
signals are public, with an example presented in Section VI. 
In Section VII, we discuss a possible extension of the proposed 
protocols to a CSMA/CA network with selfish nodes. We 
conclude the paper in Section VIII. 

II. Repeated Game Framework for Slotted 
Multiple Access Communications 

A. Stage Game 

We consider a wireless communication network with a set 
$\mathcal{N} = \{1, 2, \ldots, N\}$ of $N$ nodes interacting over time. Time is 
divided into slots of equal length, and in each slot, a node has 
a packet to transmit (i.e., saturated arrivals) and can attempt 
to send the packet or wait. Due to interference in the shared 
communication channel, a packet is transmitted successfully 
only if there is no other packet transmitted in the same slot. 
If more than one transmission takes place in a slot, a collision 
occurs and no packet is transmitted successfully. We model 
the interaction of nodes in a single slot as a non-cooperative 
game in normal form, called the random access game. 

The set of pure actions available to node $i \in \mathcal{N}$ in a slot 
is $A_i = \{T, W\}$, where $T$ stands for "transmit" and $W$ for 
"wait." We denote the pure action of node $i$ by $a_i \in A_i$ and a 
pure action profile by $a = (a_1, \ldots, a_N) \in A = \prod_{i \in \mathcal{N}} A_i$. 
A mixed action for node $i$ is a probability distribution on 
$A_i$. Since there are only two pure actions, a mixed action 
for node $i$ can be represented by a transmission probability 
$p_i \in [0, 1]$, and the set of mixed actions for node $i$ can be 
written as $P_i = [0, 1]$. A mixed action profile is denoted by 
$p = (p_1, \ldots, p_N) \in \mathcal{P} \triangleq \prod_{i \in \mathcal{N}} P_i$. The payoff function of 
node $i$ is defined by $u_i : A \to \mathbb{R}$, where $u_i(a) = 1$ if $a_i = T$ 
and $a_j = W$ for all $j \neq i$, and $u_i(a) = 0$ otherwise. That is, a 
node receives payoff 1 if it has a successful transmission and 
0 otherwise. Then, the expected payoff of a node is given by 
the probability that it has a successful transmission, and with 
a slight abuse of notation, the payoff of node $i$ when mixed 
action profile $p$ is chosen can be written as 

$$u_i(p) = p_i \prod_{j \neq i} (1 - p_j).$$

The random access game is defined by the tuple $\Gamma = 
\langle \mathcal{N}, (A_i)_{i \in \mathcal{N}}, (u_i)_{i \in \mathcal{N}} \rangle$. It is well-known from the static 
analysis of the random access game that there is at least one node $i$ 
choosing $p_i = 1$ at any pure strategy Nash equilibrium (NE) 
191 , llT?i . That is, when nodes myopically maximize their own 
payoffs, there is at least one node always transmitting its packets, 
and thus there can be at most one node obtaining a positive 
payoff. Moreover, in the unique symmetric NE, every node 
transmits with probability 1, which results in zero payoff for 
every node. On the other hand, the symmetric Pareto optimal 
(PO) outcome is achieved when each node chooses $p_c = 1/N$, 
which yields a positive payoff $u^{PO} = (1 - 1/N)^{N-1}/N$ for 
every node [22]. We call $p_c$ the cooperation probability and 
$u^{PO}$ the optimal payoff. 
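
The stage-game quantities above can be checked numerically. The following is a minimal sketch, assuming the payoff $u_i(p) = p_i \prod_{j \neq i}(1 - p_j)$ and the cooperation probability $p_c = 1/N$ defined in this section; the function names and the choice $N = 5$ are illustrative only and do not come from the paper.

```python
from math import prod


def stage_payoff(i, p):
    """Expected stage payoff of node i: p_i * prod_{j != i} (1 - p_j)."""
    return p[i] * prod(1.0 - pj for j, pj in enumerate(p) if j != i)


def optimal_payoff(n):
    """Symmetric Pareto-optimal payoff (1 - 1/N)^(N-1) / N."""
    return (1.0 - 1.0 / n) ** (n - 1) / n


# Illustrative setting (an assumption, not a value taken from the paper).
N = 5
p_c = 1.0 / N

symmetric = [p_c] * N                 # every node cooperates
all_transmit = [1.0] * N              # symmetric NE: everyone always transmits
deviating = [1.0] + [p_c] * (N - 1)   # node 0 deviates to p_d = 1

payoff_coop = stage_payoff(0, symmetric)
payoff_ne = stage_payoff(0, all_transmit)
payoff_dev = stage_payoff(0, deviating)
```

In this sketch, `payoff_coop` coincides with `optimal_payoff(N)`, `payoff_ne` is zero, and `payoff_dev` exceeds `payoff_coop`, which is the myopic incentive to deviate that the rest of the paper is designed to counter.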
B. Repeated Game 

We now formulate the repeated random access game, where 
the actions of a node can depend on its past observations, or 
information histories. Time slots are indexed by $t = 1, 2, \ldots$. 
At the end of each slot, nodes obtain signals on the pure action 
profile chosen in the slot. Let $Z_i$ be the finite set of signals 
that node $i$ can receive. Define $Z = \prod_{i \in \mathcal{N}} Z_i$, and let $Q$ 
be a mapping from $A$ to $\Delta(Z)$, where $Q(a)$ represents the 
distribution of signals when nodes choose pure action profile 
$a$. A signal structure is specified by the pair $(Z, Q)$. We say 
that signals are private if there exist $z = (z_1, \ldots, z_N) \in Z$ 
and $a \in A$ such that $z_i \neq z_j$ for some $i, j \in \mathcal{N}$ and $z$ 
occurs with positive probability in $Q(a)$. We say that signals 
are public if they are not private. That is, signals are private 
if it is possible for nodes to receive different signals, whereas 
signals are public if signal realization is the same for all nodes. 

The history of node $i$ in slot $t$, denoted by $h_i^t$, contains the 
signals that node $i$ has received by the end of slot $t - 1$. That 
is, $h_i^t = (z_i^0, \ldots, z_i^{t-1})$, for $t = 1, 2, \ldots$, where $z_i^t$ represents 
the signal that node $i$ receives in slot $t$ and $z_i^0$ is set as an 
arbitrary element of $Z_i$.¹ The set of slot $t$ histories of node 
$i$ is written as $H_i^t$, and the set of all possible histories of 
node $i$ is given by $H_i = \bigcup_{t=1}^{\infty} H_i^t$. The (behavior) strategy of 
node $i$ specifies a mixed action for node $i$ in the stage game 
conditional on a history it reaches. Thus, it can be represented 
by a mapping $\sigma_i : H_i \to P_i$. We use $\Sigma_i$ to denote the set of 
strategies of node $i$. We define a protocol as a strategy profile 
$\sigma = (\sigma_1, \ldots, \sigma_N) \in \Sigma \triangleq \prod_{i \in \mathcal{N}} \Sigma_i$. To evaluate payoffs in 
the repeated game model, we use the limit of means criterion 
since the length of a slot is typically short.² A protocol $\sigma$ 
induces a probability distribution on the sequences of mixed 
action profiles $\{p^t\}_{t=1}^{\infty}$, where $p^t$ is the mixed action profile in 
slot $t$. The payoff of node $i$ under protocol $\sigma$ can be expressed as 

$$U_i(\sigma) = \lim_{T \to \infty} E\left[ \frac{1}{T} \sum_{t=1}^{T} u_i(p^t) \right],$$

assuming that the limit exists. If the limit does not exist, we 
replace the operator $\lim$ by $\liminf$.³ 

We say that a signal structure $(Z, Q)$ is symmetric if 
$Z_1 = \cdots = Z_N$ and the signal distribution $Q$ is preserved 
under permutations of indices for nodes. With a symmetric 
signal structure, we have $H_1 = \cdots = H_N$ and thus $\Sigma_1 = 
\cdots = \Sigma_N$ since $P_1 = \cdots = P_N$. We say that a protocol $\sigma$ is 
symmetric if it prescribes the same strategy to every node, i.e., 
$\sigma_1 = \cdots = \sigma_N$. In the remainder of this paper, we assume 
that the signal structure is symmetric and focus on symmetric 
protocols. Since a symmetric protocol can be represented with 
a strategy, we use the two terms "protocol" and "strategy" 
interchangeably. Also, we use $U(\sigma^1; \sigma^2)$ to denote the payoff 
of a node when it follows strategy $\sigma^1$ while every other node 
follows strategy $\sigma^2$. Note that a symmetric protocol yields 
the same payoff to every node, thus achieving fairness among 
nodes. 

¹In slot $t \geq 2$, node $i$ also knows its past mixed actions $(p_i^1, \ldots, p_i^{t-1})$ 
and their realizations $(a_i^1, \ldots, a_i^{t-1})$. However, since we focus on repeated 
game strategies using only past signals, we do not include them in our history 
specification. 

²For example, the slot duration of the 802.11 DCF basic access method is 
20 μs [1]. 

³Although we consider the limit of means criterion, the following results 
can be extended, with some complication, to the case of the discounting 
criterion as long as the discount factor is close to 1, as in [20]. Our analysis 
can also be extended to the case where a node incurs a transmission cost 
whenever it attempts transmission, as long as the cost is small. 

C. Deviation-Proof Protocols and the Efficiency Loss 

The goal of this paper is to build a protocol that fulfills 
the following two requirements: (i) selfish nodes do not gain 
from manipulating the protocol, and (ii) the protocol achieves 
an optimal outcome. We formalize the first requirement using 
the concept of deviation-proofness while evaluating the second 
requirement using the concept of efficiency loss. 

Definition 1: A protocol $\sigma \in \Sigma_i$ is deviation-proof (DP) 
against a strategy $\sigma' \in \Sigma_i$ if 

$$U(\sigma; \sigma) \geq U(\sigma'; \sigma).$$

When $\sigma$ is DP against $\sigma'$, a node cannot gain by deviating to 
$\sigma'$ while other nodes follow $\sigma$. Hence, if a deviating node has 
only one possible deviation strategy $\sigma'$, a protocol $\sigma$ that is DP 
against $\sigma'$ satisfies the first requirement. However, in principle, 
a deviating node can choose any strategy in $\Sigma_i$, in which case 
we need a stronger concept than deviation-proofness. 

Let $\Sigma_c \subset \Sigma_i$ be the set of all constant strategies that 
prescribe a fixed transmission probability $p_d$, called the deviation 
probability, in every slot regardless of the history. 

Definition 2: A protocol $\sigma \in \Sigma_i$ is robust $\epsilon$-deviation-proof 
(robust $\epsilon$-DP) if 

$$U(\sigma; \sigma) + \epsilon \geq U(\sigma'; \sigma) \quad \text{for all } \sigma' \in \Sigma_c.$$

In words, if a protocol $\sigma$ is robust $\epsilon$-DP, a node cannot gain 
more than $\epsilon$ by deviating to a constant strategy using a fixed 
deviation probability. If there is a fixed cost of manipulating 
a given protocol and a deviation strategy is constrained to 
constant strategies, then a robust $\epsilon$-DP protocol can prevent 
a deviation by having $\epsilon$ smaller than the cost. When there is 
no restriction on possible deviation strategies, the following 
concept is relevant. 

Definition 3: A protocol $\sigma \in \Sigma_i$ is an $\epsilon$-Nash equilibrium 
($\epsilon$-NE) if 

$$U(\sigma; \sigma) + \epsilon \geq U(\sigma'; \sigma) \quad \text{for all } \sigma' \in \Sigma_i.$$

We define the system payoff as the sum of the payoffs of 
all the nodes in the system. Then, the system payoff when 
all nodes follow a protocol $\sigma$ is given by $V(\sigma) = N U(\sigma; \sigma)$. 
Since $N u^{PO}$ is the maximum system payoff in the stage game 
achievable with a symmetric action profile, we measure the 
efficiency loss from using a protocol by the following concept. 

Definition 4: The efficiency loss of a protocol $\sigma \in \Sigma_i$ is 
defined as 

$$C(\sigma) = N u^{PO} - V(\sigma). \tag{1}$$

Definition 5: A protocol $\sigma \in \Sigma_i$ is $\delta$-Pareto optimal ($\delta$-PO) 
if 

$$C(\sigma) \leq \delta.$$
TABLE I 

Main Results 

Section             | Signal                    | Test                        | Robustness to selfish manipulation                                                   | Optimality 
--------------------+---------------------------+-----------------------------+--------------------------------------------------------------------------------------+----------- 
III (Proposition 2) | Private (general)         | Asymptotically perfect test | DP against a strategy using a constant transmission probability                      | δ-PO 
IV (Theorem 2)      | Private (ACK feedback)    | ACK ratio test              | robust ε-DP                                                                          | δ-PO 
V (Proposition 5)   | Public (general)          | Asymptotically perfect test | DP against a strategy using a constant transmission probability in a review phase    | δ-PO 
VI (Theorem 4)      | Public (ternary feedback) | Idle slot ratio test        | ε-NE                                                                                 | δ-PO 



A $\delta$-PO protocol is a protocol that yields an efficiency loss 
less than or equal to $\delta$. Let $\sigma^c$ be the strategy that prescribes 
the cooperation probability $p_c$ in every slot regardless of 
the history.⁴ Then $U(\sigma^c; \sigma^c) = u^{PO}$, and thus $\sigma^c$ achieves 
full efficiency (i.e., it is 0-PO). However, $\sigma^c$ is not DP against a 
constant deviation strategy with $p_d > p_c$, as a deviating node 
can increase its payoff from $p_c (1 - p_c)^{N-1}$ to $p_d (1 - p_c)^{N-1}$. 
We construct DP protocols that achieve a near-optimal system 
payoff in the following sections, whose main results are 
summarized in Table I. 

III. Deviation-Proof Protocols When Signals are 
Private 

A. Description of Protocols with Private Signals 

In this section, we consider private signals. As pointed 
out in [23], when signals are private, it is difficult, if not 
impossible, to construct a NE that has a simple structure and 
is easy to compute. Thus, we focus on a simpler problem 
of constructing a DP protocol against a constant deviation 
strategy $\sigma^d \in \Sigma_c$. Since a simple protocol such as $\sigma^c$ is 
DP against $\sigma^d$ with $p_d \in [0, p_c]$, we restrict our attention to 
deviation strategies with $p_d \in (p_c, 1]$. Note that the restriction 
to constant deviation strategies is relevant when a deviating 
node has a limited deviation capability in the sense that it can 
reset its transmission probability only at the beginning. 

We build a protocol based on a review strategy. When a node 
uses a review strategy, it starts from a review phase for which 
it transmits with probability $p_c$ and collects signals. When the 
review phase ends, the node uses the collected signals to perform 
a statistical test whose null hypothesis is that every node 
transmitted with probability $p_c$ during the review phase. Then, 
the node moves to a reciprocation phase for which it transmits 
with probability $p_c$ (cooperation phase) if the test is passed 
and with probability 1 (punishment phase) if the test fails. 
When the reciprocation phase ends, a new review phase begins. 
A review strategy, denoted by $\sigma^r$, can be characterized by 
three elements, $(R, L, M)$, where $R$ is a statistical test, and 
$L$ and $M$ are natural numbers that represent the lengths of a 
review phase and a reciprocation phase, respectively. Thus, 
we sometimes write $\sigma^r$ as $\sigma^r(R, L, M)$. With a protocol 
based on review strategy $\sigma^r(R, L, M)$, each node performs 
the statistical test $R$ after slot $l(L + M) + L$ based on the 
signals $(z_i^{l(L+M)+1}, \ldots, z_i^{l(L+M)+L})$ collected in the most recent 
review phase, for $l = 0, 1, \ldots$. A schematic representation of 
a review strategy with private signals is provided in Fig. 1. 
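
The slot timing of a review strategy can be sketched in a few lines. The snippet below is only an illustration, assuming slots are indexed from $t = 1$ and that cycle $l$ consists of a review phase of $L$ slots followed by a reciprocation phase of $M$ slots, as described above; the function names `phase_of_slot` and `review_window` are hypothetical.

```python
def phase_of_slot(t, L, M):
    """Locate slot t (t >= 1) inside the review/reciprocation cycle.

    Returns (l, phase, test_due), where l is the cycle index, phase is
    "review" or "reciprocation", and test_due is True exactly when
    t = l(L + M) + L, i.e. the test is performed after this slot.
    """
    if t < 1:
        raise ValueError("slots are indexed from 1")
    l, offset = divmod(t - 1, L + M)
    phase = "review" if offset < L else "reciprocation"
    test_due = offset == L - 1
    return l, phase, test_due


def review_window(l, L, M):
    """First and last slot of the review phase in cycle l."""
    start = l * (L + M) + 1
    return start, start + L - 1
```

For example, with $L = 3$ and $M = 2$, slots 1 to 3 form the first review phase, the test is due after slot 3, slots 4 and 5 are the reciprocation phase, and the next review phase covers slots 6 to 8.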

⁴Note that $\sigma^c$ corresponds to a slotted Aloha protocol that does not 
distinguish new and backlogged packets as in [12]. 



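[Figure 1: flowchart. A review phase ($L$ slots: cooperate and collect 
signals) is followed by a statistical test. If the test fails, the node 
enters a punishment phase ($M$ slots); if it passes, the node enters a 
cooperation phase ($M$ slots). A new review phase follows either one.] 

Fig. 1. Review strategy with private signals 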

The review strategies in [20] differ from the review strategies 
described above in that in [20] a new review phase begins 
without having a reciprocation phase if the test is passed. A 
key difference between the model of [20] and ours is that in 
the principal-agent model of [20] only the principal reviews 
the performance of the agent whereas in our model multiple 
nodes simultaneously review the performance of other nodes. 
When signals are private, nodes do not know the results of 
the test performed by other nodes. Hence, without a reciprocation 
phase following a successful review, nodes cannot 
distinguish a deviating node from a punishing node and thus 
cannot coordinate to begin a new review phase. This problem 
can be avoided when signals are public, because the results 
of the test are the same across nodes in the case of public 
signals.⁵ Thus, a review strategy is modified accordingly in 
Section V, where we consider public signals. 

B. Analysis of Protocols with Private Signals 

1) Existence of Deviation-Proof Protocols: For the sake of 
analysis, we consider a fixed constant deviation strategy $\sigma^d \in 
\Sigma_c$ and the corresponding deviation probability $p_d \in (p_c, 1]$. 
Given a symmetric protocol that prescribes a review strategy, 
we can compute two probabilities of errors. 

• False punishment probability $P_f(R, L)$: probability that 
there is at least one node whose test fails after a review 
phase when nodes follow a protocol $\sigma^r$. 

• Miss detection probability $P_m(R, L; p_d)$: probability that 
there is no node among those following $\sigma^r$ whose test 
fails after a review phase when there is exactly one node 
deviating to $\sigma^d$. 

⁵Alternatively, this problem can be avoided by having a node that has a 
failed test broadcast that it moves to a punishment phase, as in [21]. However, 
this requires communication among nodes, which we do not allow in this 
paper. 
Since the payoff of every node is zero when there are two 
or more punishing nodes, we need to have a small false 
punishment probability to achieve a small efficiency loss. On 
the other hand, in order to punish a deviating node effectively, 
we need to have a small miss detection probability. Indeed, as 
will be shown in Proposition 2, achieving small $P_f$ and $P_m$ 
is sufficient to design a near-optimal DP protocol. 

The payoff of a node when every node follows a review 
strategy $\sigma^r$ is given by 

$$U(\sigma^r; \sigma^r) = \frac{(1 - p_c)^{N-1}}{L + M} \left[ p_c L + \left( (1 - P_f)^{\frac{N-1}{N}} - (1 - p_c)(1 - P_f) \right) M \right].$$

The payoff of a node choosing deviation strategy $\sigma^d$ while 
other nodes follow $\sigma^r$ is given by 

$$U(\sigma^d; \sigma^r) = \frac{p_d (1 - p_c)^{N-1}}{L + M} (L + P_m M).$$

By Definition 1, $\sigma^r$ is DP against $\sigma^d$ if and only if 

$$U(\sigma^r; \sigma^r) \geq U(\sigma^d; \sigma^r). \tag{2}$$

The following theorem provides a necessary and sufficient 
condition for a review strategy to be DP against $\sigma^d$. 

Theorem 1: Given $p_d \in (p_c, 1]$, protocol $\sigma^r(R, L, M)$ is 
DP against $\sigma^d$ if and only if $g(R, L; p_d) > 0$ and $M \geq 
M_{\min}(R, L; p_d)$, where 

$$g(R, L; p_d) \triangleq (1 - P_f(R, L))^{\frac{N-1}{N}} - (1 - p_c)(1 - P_f(R, L)) - p_d P_m(R, L; p_d) \tag{3}$$

and 

$$M_{\min}(R, L; p_d) \triangleq \frac{(p_d - p_c) L}{g(R, L; p_d)}.$$

Proof: Note that the net payoff gain from deviating to the 
deviation strategy $\sigma^d$ is given by 

$$U(\sigma^d; \sigma^r) - U(\sigma^r; \sigma^r) = \frac{(1 - p_c)^{N-1}}{L + M} \left[ (p_d - p_c) L - g(R, L; p_d) M \right]. \tag{4}$$

The first term in (4) is the gain during a review phase while 
the second term is the loss during a reciprocation phase. 
By (2), $\sigma^r$ is DP against $\sigma^d$ if and only if $(p_d - p_c) L \leq 
g(R, L; p_d) M$. It is easy to check that $g(R, L; p_d) > 0$ and 
$M \geq M_{\min}(R, L; p_d)$ imply $(p_d - p_c) L \leq g(R, L; p_d) M$. 
Conversely, suppose that $(p_d - p_c) L \leq g(R, L; p_d) M$. Since 
$(p_d - p_c) L > 0$, we must have $g(R, L; p_d) > 0$, which in turn 
implies $M \geq M_{\min}(R, L; p_d)$. ■ 
Theorem 1 shows that for a given statistical test $R$, we 
can construct a DP protocol based on the test if and only if 
there exists a natural number $L$ such that $g(R, L; p_d) > 0$. 
Once we find such $L$, we can use it as the length of a 
review phase and then choose a natural number $M$ satisfying 
$M \geq M_{\min}(R, L; p_d)$ to determine the length of a reciprocation 
phase. An immediate consequence of Theorem 1 is 
that if protocol $\sigma^r(R, L, M)$ is DP against $\sigma^d$, then protocol 
$\sigma^r(R, L, M')$ with $M' \geq M$ is also DP against $\sigma^d$. Thus, 
$M_{\min}(R, L; p_d)$ can be interpreted as the minimum length of a 
reciprocation phase to make $\sigma^r(R, L, M)$ DP against $\sigma^d$. The 
following result provides a sufficient condition on $R$ under 
which we can find $L$ such that $g(R, L; p_d) > 0$ and thus a DP 
protocol based on $R$ can be constructed. 

Corollary 1: Given $p_d \in (p_c, 1]$, suppose that $R$ satisfies 
$\lim_{L \to \infty} P_f(R, L) = 0$ and $\lim_{L \to \infty} P_m(R, L; p_d) = 0$. Then 
there exists $L$ such that $g(R, L; p_d) > 0$. 

Proof: By (3), $\lim_{L \to \infty} P_f(R, L) = 0$ and 
$\lim_{L \to \infty} P_m(R, L; p_d) = 0$ imply that $\lim_{L \to \infty} g(R, L; p_d) = 
p_c > 0$. Thus, $g(R, L; p_d) > 0$ for sufficiently large $L$. ■ 

Combining Theorem 1 and Corollary 1, we can see that 
if test $R$ is "asymptotically perfect" in the sense that the 
two probabilities of errors converge to zero as the test is 
performed using more signals, then we can always design a 
review strategy based on $R$ that is DP against $\sigma^d$. 
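
The condition of Theorem 1 is easy to evaluate once the two error probabilities are known. The following is a minimal sketch, assuming $p_c = 1/N$ and treating $P_f$ and $P_m$ as given numbers (how they are obtained depends on the test $R$ and is not modeled here); the function names and the numerical values in the usage note are illustrative assumptions, not results from the paper.

```python
def g_value(P_f, P_m, p_d, N):
    """Right-hand side of (3), with cooperation probability p_c = 1/N."""
    p_c = 1.0 / N
    return (1.0 - P_f) ** ((N - 1) / N) - (1.0 - p_c) * (1.0 - P_f) - p_d * P_m


def m_min(P_f, P_m, p_d, N, L):
    """Minimum reciprocation length (p_d - p_c) L / g, or None if g <= 0."""
    g = g_value(P_f, P_m, p_d, N)
    if g <= 0.0:
        return None  # no reciprocation length makes the protocol DP
    return (p_d - 1.0 / N) * L / g


def is_dp(P_f, P_m, p_d, N, L, M):
    """Theorem 1: DP iff g > 0 and M >= M_min."""
    bound = m_min(P_f, P_m, p_d, N, L)
    return bound is not None and M >= bound


def payoff_gain(P_f, P_m, p_d, N, L, M):
    """Net payoff gain (4) from deviating; non-positive when the protocol is DP."""
    p_c = 1.0 / N
    g = g_value(P_f, P_m, p_d, N)
    return (1.0 - p_c) ** (N - 1) / (L + M) * ((p_d - p_c) * L - g * M)
```

As a usage example under hypothetical values $N = 5$, $p_d = 1$, $L = 100$, $P_f = 0.05$, and $P_m = 0.1$, a reciprocation phase of $M = 900$ slots satisfies the condition, whereas $M = 700$ does not, and the sign of `payoff_gain` agrees with `is_dp` in both cases.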

2) Near-Optimal Deviation-Proof Protocols: Suppose that 
every node follows a review strategy $\sigma^r$. Since signals provide 
only imperfect information about the transmission probabilities 
of other nodes, it is possible that a punishment is 
triggered, which results in an efficiency loss as confirmed in 
the following proposition. We use $\Sigma_r$ to denote the set of all 
review strategies with private signals. 

Proposition 1: $C(\sigma^r) \geq 0$ for all $\sigma^r \in \Sigma_r$ (with equality if 
and only if $P_f = 0$). 

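Proof: Fix a protocol $\sigma^r(R, L, M) \in \Sigma_r$. By (1), we can 
express the efficiency loss of $\sigma^r$ as 

$$C(\sigma^r) = \frac{N M}{L + M} (1 - p_c)^{N-1} \left( p_c P_f - (1 - P_f)^{\frac{N-1}{N}} + (1 - P_f) \right). \tag{5}$$

Since $(1 - P_f)^{\frac{N-1}{N}}$ is concave, we have $(1 - P_f)^{\frac{N-1}{N}} \leq 1 - 
\frac{N-1}{N} P_f$ for $P_f \in [0, 1]$, with equality if and only if $P_f = 0$. 
Using $p_c = 1/N$, we obtain the result. ■ 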
Proposition 1 says that there is always a positive efficiency 
loss resulting from a review strategy unless there is a perfect 
statistical test in the sense that punishment is never triggered 
when every node follows $\sigma^r$ (i.e., $P_f = 0$). Punishment results 
in an efficiency loss because the system payoff is the same 
as $N u^{PO}$ when there is only one punishing node while it is 
zero when there are two or more. Hence, a longer punishment 
induces a larger efficiency loss. As can be seen from (5), 
for given $R$ and $L$, $C(\sigma^r)$ is non-decreasing (and increasing 
if $P_f > 0$) in $M$. Therefore, if we find $(R, L)$ such that 
$g(R, L; p_d) > 0$, choosing $M = \lceil M_{\min}(R, L; p_d) \rceil$ minimizes 
the efficiency loss while having $\sigma^r(R, L, M)$ DP against $\sigma^d$, 
where $\lceil \cdot \rceil$ denotes the ceiling function. This observation allows 
us to reduce the design choice from $(R, L, M)$ to $(R, L)$. The 
following proposition provides a sufficient condition on the 
statistical test for constructing a near-optimal DP protocol. 

Proposition 2: Given $p_d \in (p_c, 1]$, suppose that $R$ satisfies 
$\lim_{L \to \infty} P_f(R, L) = 0$ and $\lim_{L \to \infty} P_m(R, L; p_d) = 0$. Then 
for any $\delta > 0$, there exist $L$ and $M$ such that $\sigma^r(R, L, M)$ is 
DP against $\sigma^d$ and $\delta$-PO. 

Proof: Since $\lim_{L \to \infty} g(R, L; p_d) = p_c > 0$, there exists 
$L_1$ such that $g(R, L; p_d) > 0$ for all $L \geq L_1$. By Theorem 1, 
$\sigma^r(R, L, \lceil M_{\min}(R, L; p_d) \rceil)$ is DP against $\sigma^d$ for all $L \geq L_1$. 
Since $C(\sigma^r)$ is non-decreasing in $M$, we have 

$$0 \leq C(\sigma^r) \leq \frac{N (M_{\min}(R, L; p_d) + 1)}{L + M_{\min}(R, L; p_d) + 1} (1 - p_c)^{N-1} \left( p_c P_f - (1 - P_f)^{\frac{N-1}{N}} + (1 - P_f) \right). \tag{6}$$

Note that $\lim_{L \to \infty} M_{\min}(R, L; p_d)/L = (p_d - p_c)/p_c$, and 
thus the right-hand side of (6) converges to zero as $L$ goes to 
infinity, which implies $\lim_{L \to \infty} C(\sigma^r) = 0$. Therefore, there 
exists $L_2$ such that $C(\sigma^r) \leq \delta$ for all $L \geq L_2$. Choose $L \geq 
\max\{L_1, L_2\}$ and $M = \lceil M_{\min}(R, L; p_d) \rceil$ to obtain a protocol 
with the desired properties. ■ 

Proposition 2 shows that the efficiency loss of a DP protocol 
can be made arbitrarily small when there is an asymptotically 
perfect statistical test. It also points out a trade-off between 
optimality and implementation cost. In order to make the 
efficiency loss within a small desired level, L should be chosen 
sufficiently large, which requires large M by the relationship 
$M = \lceil M_{\min}(R, L; p_d) \rceil$. At the same time, as $L$ and $M$ 
become larger, each node needs to maintain longer memory to 
execute a review strategy, which can be considered as higher 
implementation cost. 
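
The trade-off between optimality and memory can be made concrete with a small design routine. The sketch below assumes the efficiency loss expression (5) and the choice $M = \lceil M_{\min} \rceil$; the error model `toy_errors`, in which both error probabilities decay exponentially in $L$, is purely hypothetical and stands in for whatever asymptotically perfect test is actually used.

```python
from math import ceil, exp


def efficiency_loss(P_f, N, L, M):
    """Efficiency loss (5) of a review strategy, with p_c = 1/N."""
    p_c = 1.0 / N
    bracket = p_c * P_f - (1.0 - P_f) ** ((N - 1) / N) + (1.0 - P_f)
    return N * M / (L + M) * (1.0 - p_c) ** (N - 1) * bracket


def design(N, p_d, L, P_f, P_m):
    """Return (M, loss) with M = ceil(M_min), or None if g <= 0 for this L."""
    p_c = 1.0 / N
    g = (1.0 - P_f) ** ((N - 1) / N) - (1.0 - p_c) * (1.0 - P_f) - p_d * P_m
    if g <= 0.0:
        return None
    M = ceil((p_d - p_c) * L / g)
    return M, efficiency_loss(P_f, N, L, M)


def toy_errors(L, rate=0.01):
    """Hypothetical error model: P_f = P_m = exp(-rate * L). Illustration only."""
    p = exp(-rate * L)
    return p, p


# Compare a short and a long review phase under the toy error model.
short_design = design(5, 1.0, 200, *toy_errors(200))
long_design = design(5, 1.0, 1000, *toy_errors(1000))
too_short = design(5, 1.0, 50, *toy_errors(50))
```

Under this toy model, the review phase of 50 slots is too short to yield $g > 0$, and moving from $L = 200$ to $L = 1000$ lowers the efficiency loss at the price of a longer reciprocation phase, which mirrors the memory cost discussed above.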

We make a couple of remarks. First, the constructed DP 
protocols are DP against multiple nodes deviating to $\sigma^d$. 
The payoff gain from deviation decreases with the number 
of deviating nodes. Hence, if a protocol can deter a single 
node from deviating to $\sigma^d$, it can also deter multiple nodes 
from doing so. Second, the constructed DP protocols are 
DP against a more general class of deviation strategies with 
which a permanent deviation to pd occurs in an arbitrary slot 
(determined deterministically or randomly). A deviating node 
cannot gain starting from a review phase after a deviation 
occurs, and without discounting its temporary gain is smaller 
than the perpetual loss. 

IV. Protocols Based on the ACK Ratio Test 

A. Description of the ACK Signal Structure and Protocols 
Based on the ACK Ratio Test 

In this section, we illustrate the results in Section III 
by considering a particular signal structure and a particular 
statistical test. In the slotted Aloha protocol in [24], a node 
receives an acknowledgement (ACK) signal if it transmits its 
packet successfully and no signal otherwise. In the ACK signal 
structure, the signal space can be written as $Z_i = \{S, F\}$, for 
all $i \in \mathcal{N}$, where $z_i = S$ means that node $i$ receives an ACK 
signal and $z_i = F$ means that it does not. We assume that there 
is no error in the transmission and reception of ACK signals. 
The signal distribution $Q$ is such that $Q(a)$ puts probability 
mass 1 on $z \in Z$ with $z_i = S$ and $z_j = F$ for all $j \neq i$ 
if $a_i = T$ and $a_j = W$ for all $j \neq i$, for each $i \in \mathcal{N}$, and 
probability mass 1 on $z_i = F$ for all $i$ otherwise. Since it is 
possible for nodes to receive different signals (only one node 
receives signal $S$ when a success occurs), ACK signals are 
private. 

Fig. 2. Automaton representation of a review strategy based on the ACK ratio test with parameters satisfying $1 < L(q_c - B) < 2$.

In a review strategy with the ACK signal structure, a node uses the ACK signals collected in a review phase to perform a statistical test. We propose a particular statistical test called the ACK ratio test. The test statistic of the ACK ratio test is the ratio of the number of ACK signals obtained in a review phase to the length of a review phase, i.e., $\sum_{t=\tau+1}^{\tau+L} \chi\{z_i^t = S\}/L$, where $\chi$ is an indicator function and $\tau + 1$ represents the slot in which a review phase begins. The test is passed if the statistic exceeds a threshold value, $q_c - B$, where $q_c = p_c(1 - p_c)^{N-1}$ and $B \in (0, q_c)$, and fails otherwise. Note that $q_c$ is the expected value of the ACK ratio when every node transmits with probability $p_c$. If there is a deviating node, the ACK ratio tends to be smaller because its expected value is reduced to $q_d = p_c(1 - p_c)^{N-2}(1 - p_d)$. The ACK ratio test is designed to distinguish between these two events statistically while having $B$ as a "margin of error." Since the ACK ratio test can be identified with $B$, we use $B$ instead of $R$ to represent the ACK ratio test.
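As a minimal sketch of the ACK ratio test described above, the following Python snippet computes $q_c$ and $q_d$ and applies the test to the signals of one review phase. The function names and the encoding of signals as the strings "S" and "F" are illustrative assumptions, not part of the protocol specification.

```python
def ack_ratio_thresholds(n_nodes: int, p_c: float, p_d: float) -> tuple[float, float]:
    """Return (q_c, q_d): expected ACK ratio without and with one deviating node."""
    q_c = p_c * (1 - p_c) ** (n_nodes - 1)
    q_d = p_c * (1 - p_c) ** (n_nodes - 2) * (1 - p_d)
    return q_c, q_d


def ack_ratio_test(signals: list[str], q_c: float, margin: float) -> bool:
    """Return True if the ACK ratio test is passed for one review phase.

    `signals` holds the node's private signals ("S" or "F"), one per slot.
    The test is passed only if the ACK ratio strictly exceeds q_c - margin.
    """
    ack_ratio = sum(1 for z in signals if z == "S") / len(signals)
    return ack_ratio > q_c - margin


# Parameters used in the numerical section: N = 5, p_c = 0.2, p_d = 0.7.
q_c, q_d = ack_ratio_thresholds(n_nodes=5, p_c=0.2, p_d=0.7)

# Hypothetical review phase of L = 30 slots containing two ACKs.
phase = ["S", "S"] + ["F"] * 28
passed = ack_ratio_test(phase, q_c, margin=0.04)
```

With these parameters the gap $q_c - q_d$ equals 0.0512, which is the upper limit on $B$ that appears later in Lemma 1.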

A review strategy based on the ACK ratio test, $\sigma^r(B, L, M)$, can be represented formally as follows. The transmission probability prescribed to node $i$ in slot $t$ is

$$
\begin{cases}
p_c, & t \in [l(L+M)+1,\ l(L+M)+L], \\
1, & t \in [l(L+M)+L+1,\ (l+1)(L+M)] \text{ and } \sum_{k=l(L+M)+1}^{l(L+M)+L} \chi\{z_i^k = S\}/L \le q_c - B, \\
p_c, & t \in [l(L+M)+L+1,\ (l+1)(L+M)] \text{ and } \sum_{k=l(L+M)+1}^{l(L+M)+L} \chi\{z_i^k = S\}/L > q_c - B,
\end{cases}
$$

for $l = 0, 1, \ldots$. Fig. 2 shows an automaton representation of the review strategy $\sigma^r$ for $1 < L(q_c - B) < 2$, in which case a node triggers punishment if it obtains fewer than two successes in a review phase. Each state transition is labeled by the set of signals that induce the transition. In a reciprocation phase, a node goes through either states P1 to PM (punishment phase) or states C1 to CM (cooperation phase) depending on the number of ACK signals obtained in the review phase. Note that the number of states in the automaton representation of protocol $\sigma^r(B, L, M)$ is given by $N_s(\sigma^r) = kL - k(k-1)/2 + 2M$, where $k \ge 2$ is the natural number satisfying $k - 2 < L(q_c - B) < k - 1$.
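The epoch structure above can be sketched as a small state machine. The class below is an illustrative sketch, not the authors' implementation: the class name, method names, and the string encoding of signals are assumptions. It tracks the slot position within an epoch of $L + M$ slots, counts ACKs during the review phase, and transmits with probability 1 during the reciprocation phase if the test failed.

```python
class AckReviewStrategy:
    """Per-node review strategy with private ACK signals (illustrative sketch)."""

    def __init__(self, p_c: float, q_c: float, margin: float,
                 review_len: int, recip_len: int) -> None:
        self.p_c = p_c
        self.q_c = q_c
        self.margin = margin
        self.review_len = review_len
        self.recip_len = recip_len
        self.slot = 0           # 0-based position within the current epoch
        self.acks = 0           # ACKs collected in the current review phase
        self.punishing = False  # outcome of the most recent test

    def transmit_prob(self) -> float:
        """Transmission probability prescribed for the current slot."""
        in_review = self.slot < self.review_len
        return 1.0 if (not in_review and self.punishing) else self.p_c

    def observe(self, signal: str) -> None:
        """Record the signal of the current slot ("S" or "F") and advance one slot."""
        if self.slot < self.review_len and signal == "S":
            self.acks += 1
        self.slot += 1
        if self.slot == self.review_len:
            # The test fails when the ACK ratio does not exceed q_c - B.
            self.punishing = self.acks / self.review_len <= self.q_c - self.margin
        if self.slot == self.review_len + self.recip_len:
            self.slot, self.acks, self.punishing = 0, 0, False


# Toy epoch with L = 3 and M = 2: no ACK in the review phase triggers punishment.
strategy = AckReviewStrategy(p_c=0.2, q_c=0.08192, margin=0.04, review_len=3, recip_len=2)
probs = []
for signal in ["F", "F", "F", "F", "F", "S"]:
    probs.append(strategy.transmit_prob())
    strategy.observe(signal)
```

In this toy run the node transmits with probability $p_c$ for three slots, with probability 1 for the two punishment slots, and then returns to $p_c$ when the next review phase begins.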
B. Analytical Results

Let $F(y; n, p)$ be the cumulative distribution function of a binomial random variable with total number of trials $n$ and probability of success $p$, i.e.,

$$F(y; n, p) = \sum_{m=0}^{\lfloor y \rfloor} \binom{n}{m} p^m (1 - p)^{n-m},$$

where $\lfloor \cdot \rfloor$ denotes the floor function. Suppose that every node transmits with probability $p_c$ in a review phase. Then, the number of ACK signals that a node receives in the review phase follows a binomial distribution with parameters $L$ and $q_c$. Thus, the probability that a punishment is triggered by node $i$ is given by

$$\Pr\left\{ \sum_{t=\tau+1}^{\tau+L} \chi\{z_i^t = S\}/L \le q_c - B \right\} = F(L(q_c - B); L, q_c),$$

and the false punishment probability is given by

$$P_f(B, L) = 1 - [1 - F(L(q_c - B); L, q_c)]^N.$$

Suppose that there is exactly one deviating node using $\sigma^d$, i.e., transmitting with probability $p_d$ in a review phase. Then, the success probability in the binomial distribution changes to $q_d$, and thus the miss detection probability is given by

$$P_m(B, L; p_d) = [1 - F(L(q_c - B); L, q_d)]^{N-1}.$$
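The two error probabilities can be evaluated directly from the binomial CDF. The sketch below uses only the Python standard library; the function names are illustrative assumptions, and the direct summation with `math.comb` is intended for moderate review lengths (a few hundred slots) rather than as a numerically robust routine.

```python
import math


def binom_cdf(y: float, n: int, p: float) -> float:
    """F(y; n, p): probability that a Binomial(n, p) variable is at most floor(y)."""
    top = min(math.floor(y), n)
    if top < 0:
        return 0.0
    total = sum(math.comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(top + 1))
    return min(1.0, total)


def false_punishment_prob(margin: float, review_len: int, n_nodes: int, p_c: float) -> float:
    """P_f(B, L) for the ACK ratio test: at least one of N nodes fails the test."""
    q_c = p_c * (1 - p_c) ** (n_nodes - 1)
    f = binom_cdf(review_len * (q_c - margin), review_len, q_c)
    return 1 - (1 - f) ** n_nodes


def miss_detection_prob(margin: float, review_len: int, n_nodes: int,
                        p_c: float, p_d: float) -> float:
    """P_m(B, L; p_d) for the ACK ratio test: all N - 1 compliant nodes pass the test."""
    q_c = p_c * (1 - p_c) ** (n_nodes - 1)
    q_d = p_c * (1 - p_c) ** (n_nodes - 2) * (1 - p_d)
    f = binom_cdf(review_len * (q_c - margin), review_len, q_d)
    return (1 - f) ** (n_nodes - 1)


# Example with N = 5, p_c = 0.2, p_d = 0.7, B = 0.04, and L = 50.
p_f = false_punishment_prob(0.04, 50, 5, 0.2)
p_m = miss_detection_prob(0.04, 50, 5, 0.2, 0.7)
```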



The monotonicity of $P_f$ and $P_m$ with respect to the test parameter $B$ is readily obtained.

Proposition 3: Given $p_d \in (p_c, 1]$ and $L$, $P_f(B, L)$ and $P_m(B, L; p_d)$ are non-increasing and non-decreasing in $B \in (0, q_c)$, respectively.

Proof: The proof is straightforward by noting that $F(L(q_c - B); L, q_c)$ and $F(L(q_c - B); L, q_d)$ are non-increasing in $B \in (0, q_c)$. ■

As the margin of error becomes larger, the test is more likely to be passed, yielding a smaller false punishment probability and a larger miss detection probability. The following lemma examines the asymptotic properties of $P_f$ and $P_m$ as $L$ becomes large.

Lemma 1: Given $p_d \in (p_c, 1]$, $\lim_{L \to \infty} P_f(B, L) = 0$ for all $B \in (0, q_c)$, $\lim_{L \to \infty} P_m(B, L; p_d) = 0$ for all $B \in (0, q_c - q_d)$, and $\lim_{L \to \infty} P_m(B, L; p_d) = 1$ for all $B \in (q_c - q_d, q_c)$.

Proof: Since $\chi\{z_i^{\tau+k} = S\}$, for $k = 1, \ldots, L$, can be considered as $L$ i.i.d. random variables, we can apply the strong law of large numbers to the ACK ratio [25]. When every node transmits with probability $p_c$, the ACK ratio converges almost surely to $q_c$ as $L$ goes to infinity, which implies that the false punishment probability goes to zero for all $B > 0$. When there is exactly one node transmitting with probability $p_d$, the ACK ratio of a node transmitting with probability $p_c$ converges almost surely to $q_d$ as $L$ goes to infinity. Hence, if $q_d < q_c - B$ (resp. $q_d > q_c - B$), the miss detection probability goes to zero (resp. one). ■
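The convergence used in the proof can be checked empirically with a small Monte Carlo simulation of slotted Aloha. This is only an illustrative sketch: the function name, the fixed seed, and the choice of node 0 as the observed node are assumptions made for the example.

```python
import random


def simulate_ack_ratio(n_nodes: int, p_c: float, p_d: float,
                       review_len: int, seed: int = 0) -> float:
    """ACK ratio observed by node 0 over `review_len` slots.

    Nodes 0..n_nodes-2 transmit with probability p_c and the last node
    transmits with probability p_d (set p_d = p_c for no deviation).
    """
    rng = random.Random(seed)
    probs = [p_c] * (n_nodes - 1) + [p_d]
    acks = 0
    for _ in range(review_len):
        transmitting = [i for i, p in enumerate(probs) if rng.random() < p]
        if transmitting == [0]:  # node 0 succeeds only if it transmits alone
            acks += 1
    return acks / review_len


# Long review phases with N = 5 and p_c = 0.2: the ratio should be close to
# q_c = 0.08192 without deviation and to q_d = 0.03072 when p_d = 0.7.
ratio_honest = simulate_ack_ratio(5, 0.2, 0.2, 200_000)
ratio_deviating = simulate_ack_ratio(5, 0.2, 0.7, 200_000)
```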

Lemma 1 provides a sufficient condition on the ACK ratio test to apply Proposition 2.

Proposition 4: Suppose that $B \in (0, q_c - q_d)$. For any $\delta > 0$, there exist $L$ and $M$ such that $\sigma^r(B, L, M)$ is DP against $\sigma^d$ and $\delta$-PO.

Proof: The proposition follows from Lemma 1 and Proposition 2. ■

Proposition 4 states that for given $p_d \in (p_c, 1]$, we can construct a protocol $\sigma^r$ that is DP against $\sigma^d$ and achieves an arbitrarily small efficiency loss by setting $B$ such that $0 < B < q_c - q_d = p_c(1 - p_c)^{N-2}(p_d - p_c)$. Note that as $p_d$ becomes larger, it is easier to detect a deviation, and thus we have a wider range of $B$ that renders deviation-proofness and near-optimality.

So far we have considered a constant deviation strategy $\sigma^d$ prescribing a fixed deviation probability $p_d$ and designed a protocol that is DP against $\sigma^d$. However, it is natural to regard $p_d$ as a choice by a deviating node, and thus in principle it can be any probability. Now we allow the possibility that a deviating node can use any constant deviation strategy, and we obtain the following result.

Theorem 2: For any $\epsilon > 0$ and $\delta > 0$, there exist $B$, $L$, and $M$ such that $\sigma^r(B, L, M)$ is robust $\epsilon$-DP and $\delta$-PO.

Proof: The proof is relegated to Appendix A. ■

We can interpret $\epsilon$ and $\delta$ as performance requirements. Requiring smaller $\epsilon$ makes protocols more robust, while requiring smaller $\delta$ results in a higher system payoff. In addition to the trade-off between optimality and implementation cost already mentioned following Proposition 2, we can identify a similar trade-off between robustness and implementation cost in that smaller $\epsilon$ in general requires larger $L$ and $M$ to construct a robust $\epsilon$-DP protocol.

C. Numerical Results 

To provide numerical results, we consider a network with 5 nodes, i.e., $N = 5$ and $p_c = 1/N = 0.2$. Fig. 3 plots the false punishment probability $P_f(B, L)$ and the miss detection probability $P_m(B, L; p_d)$ while varying the length of a review phase $L$. Fig. 3(a) shows that $P_f(B, L)$ exhibits a decreasing tendency as $L$ increases, with discontinuities occurring at the points where the floor function of $L(q_c - B)$ has a jump. We can also see that $P_f$ is smaller for larger $B$, as shown in Proposition 3. The upper threshold for the parameter $B$ to yield $\lim_{L \to \infty} P_m(B, L; p_d) = 0$ in Lemma 1 is $q_c - q_d = 0.0512$ for $p_d = 0.7$. We can see from Fig. 3(b) that $P_m(B, L; p_d)$ approaches 0 as $L$ becomes large when $B$ is smaller than this threshold, whereas it approaches 1 when $B$ exceeds the threshold. Fig. 3(b) also shows that, for fixed $B$, $P_m(B, L; p_d)$ is smaller for larger $p_d$, i.e., as the deviation becomes greedier, it is more likely to be detected.

Fig. 4 plots the relationship between the length of a review phase $L$ and the minimum length of a reciprocation phase $\lceil M_{\min}(B, L; p_d) \rceil$ to have a DP protocol for different values of $B$ and $p_d$. In Fig. 4(a), we fix $p_d = 0.7$ and consider $B = 0.04$ and $0.06$. Note that when $B = 0.06$, some values of $L$ result in large minimum values of $M$, which are not displayed in Fig. 4(a). Also, the values of $L$ with which no DP protocol can be constructed for given $B$ and $p_d$ (i.e., $g(B, L; p_d) \le 0$) are indicated with $\lceil M_{\min}(B, L; p_d) \rceil = 0$ in Fig. 4(a). For example, we cannot construct a DP protocol using $L$ such that $42 \le L \le 45$ or $84 \le L \le 91$ when $B = 0.06$ and $p_d = 0.7$. When $B = 0.04$, we can construct a DP protocol using any $L \ge 10$. In Fig. 4(b), we fix $B = 0.04$ and consider $p_d = 0.7$ and $0.85$. For the considered values of $p_d$, we observe that the minimum length of a reciprocation phase is increasing in $p_d$ for most values of $L$. Also, in general, a longer review phase requires a longer reciprocation phase for fixed $p_d$, although a reverse relationship may be obtained, especially when $L$ is small. Note that $L$ and $\lceil M_{\min}(B, L; p_d) \rceil$ have a linear relationship in the limit since $\lim_{L \to \infty} M_{\min}(B, L; p_d)/L = (p_d - p_c)/p_c$.

Fig. 3. $P_f(B, L)$ and $P_m(B, L; p_d)$ versus the length of a review phase $L$.

Fig. 4. The minimum length of a reciprocation phase $\lceil M_{\min}(B, L; p_d) \rceil$ versus the length of a review phase $L$: (a) $p_d = 0.7$, and (b) $B = 0.04$.

Fig. 5. Efficiency loss $C(\sigma^r)$ versus the length of a review phase $L$: (a) $p_d = 0.7$, and (b) $B = 0.04$.



Fig. 5 plots efficiency loss $C(\sigma^r)$ against the length of a review phase $L$ when the length of a reciprocation phase is chosen as $\lceil M_{\min}(B, L; p_d) \rceil$ for different values of $B$ and $p_d$. The points where efficiency loss is shown as 0 in Fig. 5(a) are where no DP protocol exists for the given parameters. We can observe that as $L$ increases, efficiency loss tends to decrease to 0, which is consistent with Proposition 4. Fig. 5(a) shows that for fixed $p_d = 0.7$, efficiency loss is smaller when $B = 0.06$ than when $B = 0.04$. This is because the false punishment probability of the former case is smaller than that of the latter case, as shown in Fig. 3(a). Fig. 5(b) shows that efficiency loss is almost the same for the two considered deviation probabilities when $B = 0.04$.

TABLE II
Parameters and the Efficiency Loss of Optimal Protocols

p_d     (L, M)       Efficiency loss
0.6     (22, 101)    0.0570
0.65    (23, 101)    0.0490
0.7     (23, 94)     0.0483
0.75    (23, 91)     0.0480
0.8     (23, 90)     0.0479
0.85    (23, 92)     0.0481
0.9     (23, 96)     0.0485
0.95    (23, 102)    0.0490
1       (22, 106)    0.0575

D. Deviation-Proof Protocols with Complexity Considerations 

1) Protocol Design Problem with a Complexity Constraint: So far we have explored the possibility of constructing near-optimal deviation-proof protocols based on a review strategy. We mention briefly how to incorporate complexity considerations in the protocol design problem. One approach to measure the complexity of a repeated game strategy is to use the number of states of the smallest automaton that can implement the strategy [26]. Thus, we can formulate the following protocol design problem, assuming that the deviation strategy is fixed as $\sigma^d$:

$$
\begin{aligned}
\text{minimize} \quad & C(\sigma^r(B, L, M)) \\
\text{subject to} \quad & \sigma^r \text{ is DP against } \sigma^d, \\
& N_s(\sigma^r) \le \bar{N}_s.
\end{aligned} \tag{7}
$$

The second constraint can be interpreted as a complexity constraint which bounds the number of states in the automaton representation of $\sigma^r$. Without a complexity constraint, efficiency loss can be made arbitrarily small while satisfying the first constraint by choosing sufficiently large $L$, as shown in Proposition 4. Thus, the second constraint prevents $L$ from growing without bound.

2) Protocol Design Method: We propose a method to find an optimal protocol that solves the protocol design problem (7).

• Step 1. Determine a finite set $\mathcal{B} \subset (0, q_c)$ as the set of possible values of $B$.

• Step 2. Fix $B \in \mathcal{B}$. Identify the set of feasible $(L, M)$ in the sense that $(L, M)$ satisfies the second constraint of (7) given $B$.

• Step 3. Fix feasible $L$, and check whether $g(B, L; p_d)$ in (3) is positive. If so, choose $M$ as the smallest feasible value of $M$ larger than or equal to $M_{\min}(B, L; p_d)$, which we denote by $M(B, L)$, if such a value exists. Then, $\sigma^r(B, L, M(B, L))$ is a protocol that satisfies both constraints of (7).

• Step 4. By varying $B$ and $L$, obtain protocols that satisfy both constraints. Among these protocols, choose a protocol that yields the smallest efficiency loss.

As an illustrative example, we consider $N = 5$ and set $\bar{N}_s = 2^8 = 256$ so that protocols can be implemented using 8-bit memory. For simplicity, we fix $B$ at 0.04, i.e., $\mathcal{B} = \{0.04\}$. Table II presents the parameters $(L, M)$ and the efficiency loss of optimal protocols for different deviation probabilities. We can see that the optimal protocols have different parameters for different values of $p_d$. Due to jumps in the efficiency loss curves as shown in Fig. 5, the optimal protocols do not necessarily have the longest possible review phase.
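Step 2 of the method can be sketched with the state-count formula $N_s(\sigma^r) = kL - k(k-1)/2 + 2M$ from Section IV-A. The helper names below are illustrative assumptions. The code takes $k = \lfloor L(q_c - B) \rfloor + 2$, which matches the defining inequality whenever $L(q_c - B)$ is not an integer; the treatment of the integer boundary case is an assumption of this sketch.

```python
import math


def num_states(review_len: int, recip_len: int, q_c: float, margin: float) -> int:
    """Number of automaton states N_s = k*L - k*(k-1)/2 + 2*M."""
    k = math.floor(review_len * (q_c - margin)) + 2
    return k * review_len - k * (k - 1) // 2 + 2 * recip_len


def max_feasible_recip_len(review_len: int, q_c: float, margin: float,
                           max_states: int):
    """Largest M >= 1 with N_s <= max_states, or None if no such M exists."""
    spare = max_states - num_states(review_len, 0, q_c, margin)
    return spare // 2 if spare >= 2 else None


# Setting of the illustrative example: N = 5, p_c = 0.2, B = 0.04, 256 states.
q_c = 0.2 * (1 - 0.2) ** 4
limits = {L: max_feasible_recip_len(L, q_c, 0.04, 256) for L in (22, 23, 24)}
```

Under these assumptions the largest feasible $M$ drops sharply between $L = 23$ and $L = 24$, where $\lfloor L(q_c - B) \rfloor$ jumps from 0 to 1, and every $(L, M)$ pair listed in Table II stays within the 256-state budget.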



V. Deviation-Proof Protocols When Signals are 
Public 

A. Motivation 

As mentioned in Section III-A, when signals are private,
nodes do not know the results of the test that other nodes 
perform. Hence, nodes need to have a reciprocation phase 
regardless of the results of the test in order to synchronize 
the beginning of a review phase across nodes. However, this 
structure of a review strategy creates a weakness that can be 
exploited by a deviating node. A deviating node can cooperate 
in a review phase to avoid punishment and then defect in a 
reciprocation phase to obtain a payoff gain. To exclude such 
a "smart" deviation, in Sections III and IV we have focused 
on constant deviation strategies when designing DP protocols. 
However, this complication does not arise when signals are 
public. Since the result of the test is commonly known among 
nodes, a reciprocation phase can be skipped when the test 
is passed, eliminating the room for exploitation. This added 
robustness of protocols with public signals can be regarded 
as the value of public signals when the signal structure is a 
design choice. 

B. Description of Protocols with Public Signals 

When signals are public, nodes receive a common signal, and thus we use $z^t$, without subscript $i$, to denote the signal in slot $t$. A review strategy with public signals is the same as the one described in Section III-A except that there is no cooperation phase. That is, a new review phase begins immediately if the statistical test is passed. If the test fails, a punishment phase occurs as before. Since we focus on symmetric protocols, all nodes use the same statistical test and perform the test based on the same signals. Hence, all nodes obtain the same result of the test, and thus they are always in the same phase. We use $\tilde{\sigma}^r(R, L, M)$ to denote the review strategy with public signals that uses test $R$ and has $L$ and $M$ as the lengths of a review phase and a punishment phase, respectively.

C. Analysis of Protocols with Public Signals 

We first consider a fixed deviation strategy $\sigma^d$ that has the same structure as the prescribed review strategy $\tilde{\sigma}^r$. That is, a deviating node transmits with probability $p_d$ in a review phase and with $p_r$ in a punishment phase. Since no node obtains a positive payoff in a punishment phase, the choice of $p_r$ does not affect the analysis, and thus for analysis only $p_d$ matters. For the same reason as in Section III, we focus on the case where $p_d > p_c$.

As in the case of private signals, we can compute two probabilities of error: the false punishment probability $P_f(R, L)$ and the miss detection probability $P_m(R, L; p_d)$. Since a punishment phase occurs with probability $P_f$ and results in zero payoff for every node when all nodes follow a review strategy, we have

$$U(\tilde{\sigma}^r; \tilde{\sigma}^r) = \frac{L q_c}{L + P_f M}.$$

Note that $(L + P_f M)$ is the average length of an epoch, defined as a review phase and the following punishment phase if one exists, and $L q_c$ is the accumulated expected payoff for a node in an epoch. The payoff of a node choosing deviation strategy $\sigma^d$ while other nodes follow $\tilde{\sigma}^r$ is given by

$$U(\sigma^d; \tilde{\sigma}^r) = \frac{L p_d (1 - p_c)^{N-1}}{L + (1 - P_m) M}.$$

The efficiency loss of $\tilde{\sigma}^r$ can be computed as

$$C(\tilde{\sigma}^r) = \frac{N P_f M q_c}{L + P_f M}, \tag{8}$$

which is always nonnegative (positive if $P_f > 0$). Note that the nonnegativity of the efficiency loss does not require $p_c = 1/N$, unlike in the case of private signals (see Proposition 1). The following theorem is an analogue of Theorem 1 for the case of public signals.

Theorem 3: Given $p_d \in (p_c, 1]$, protocol $\tilde{\sigma}^r(R, L, M)$ is DP against $\sigma^d$ if and only if $g(R, L; p_d) > 0$ and $M \ge M_{\min}(R, L; p_d)$, where

$$g(R, L; p_d) = p_c(1 - P_m(R, L; p_d)) - p_d P_f(R, L)$$

and

$$M_{\min}(R, L; p_d) \triangleq \frac{(p_d - p_c) L}{g(R, L; p_d)}.$$

Proof: $U(\tilde{\sigma}^r; \tilde{\sigma}^r) \ge U(\sigma^d; \tilde{\sigma}^r)$ if and only if $(p_d - p_c)L \le g(R, L; p_d) M$. Note that $(1 - p_c)^{N-1}(p_d - p_c)L$ is the gain from deviation in a review phase while $(1 - p_c)^{N-1} g(R, L; p_d) M$ is the expected loss from deviation in a punishment phase. The result can be obtained by using a similar argument as in the proof of Theorem 1. ■
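The condition in Theorem 3 can be checked numerically. The sketch below takes the two error probabilities as given inputs and evaluates $g$, $\lceil M_{\min} \rceil$, the two payoffs, and the efficiency loss in (8). The function names and the sample values $P_f = 0.05$ and $P_m = 0.1$ are hypothetical and chosen only for illustration.

```python
import math


def g_value(p_c: float, p_d: float, p_false: float, p_miss: float) -> float:
    """g = p_c * (1 - P_m) - p_d * P_f."""
    return p_c * (1 - p_miss) - p_d * p_false


def min_punishment_len(p_c, p_d, review_len, p_false, p_miss):
    """ceil(M_min), or None when g <= 0 (no DP protocol for this review length)."""
    g = g_value(p_c, p_d, p_false, p_miss)
    if g <= 0:
        return None
    return math.ceil((p_d - p_c) * review_len / g)


def payoffs(p_c, p_d, n_nodes, review_len, punish_len, p_false, p_miss):
    """Return (compliant payoff, deviation payoff) for protocols with public signals."""
    others_idle = (1 - p_c) ** (n_nodes - 1)
    compliant = review_len * p_c * others_idle / (review_len + p_false * punish_len)
    deviating = review_len * p_d * others_idle / (review_len + (1 - p_miss) * punish_len)
    return compliant, deviating


def efficiency_loss(p_c, n_nodes, review_len, punish_len, p_false):
    """Efficiency loss in (8), with q_c = p_c * (1 - p_c)^(N - 1)."""
    q_c = p_c * (1 - p_c) ** (n_nodes - 1)
    return n_nodes * p_false * punish_len * q_c / (review_len + p_false * punish_len)


# Hypothetical error probabilities for L = 50, N = 5, p_c = 0.2, p_d = 0.7.
m = min_punishment_len(0.2, 0.7, 50, p_false=0.05, p_miss=0.1)
u_comply, u_deviate = payoffs(0.2, 0.7, 5, 50, m, 0.05, 0.1)
loss = efficiency_loss(0.2, 5, 50, m, 0.05)
```

With a punishment phase one slot shorter than $\lceil M_{\min} \rceil$, the deviation payoff in this example exceeds the compliant payoff, which illustrates why the bound in Theorem 3 is tight.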

Theorem 3 shows that for a given statistical test $R$, we can construct a DP protocol based on the test if and only if there exists a natural number $L$ such that $g(R, L; p_d) > 0$. Once we find such $L$, we can use it as the length of a review phase and then choose $M$ larger than or equal to $M_{\min}(R, L; p_d)$ to determine the length of a punishment phase. Since $C(\tilde{\sigma}^r)$ is non-decreasing in $M$ for fixed $R$ and $L$, as can be seen from (8), the efficiency loss is minimized for given $(R, L)$ by setting $M = \lceil M_{\min}(R, L; p_d) \rceil$ so that the length of a punishment phase is just enough to prevent deviation. Again, this observation reduces the design choices for a review strategy from $(R, L, M)$ to $(R, L)$. The next result is an analogue of Proposition 2, showing that if an asymptotically perfect statistical test is available, we can construct a near-optimal DP protocol.

Proposition 5: Given $p_d \in (p_c, 1]$, suppose that $R$ satisfies $\lim_{L \to \infty} P_f(R, L) = 0$ and $\lim_{L \to \infty} P_m(R, L; p_d) = 0$. Then for any $\delta > 0$, there exist $L$ and $M$ such that $\tilde{\sigma}^r(R, L, M)$ is DP against $\sigma^d$ and $\delta$-PO.

Proof: The proof is similar to that of Proposition 2, and thus is omitted for brevity. ■




Fig. 6. Automaton representation of a review strategy based on the idle slot ratio test with parameters satisfying $1 < L(q_c - B) < 2$.



VI. Protocols Based on the Idle Slot Ratio Test 

A. Description of the Ternary Signal Structure and Protocols 
Based on the Idle Slot Ratio Test 

To illustrate the results in Section V, we consider the ternary signal structure as in [27], [28], whose signal space can be written as $Z = \{0, 1, e\}$. Nodes receive signal 0 if the slot is idle, 1 if there is a success, and $e$ if there is a collision. Signals under the ternary signal structure are public because nodes always receive a common signal. We consider a review strategy with which nodes use the fraction of idle slots in a review phase, or the idle slot ratio, as the test statistic. If every node transmits with probability $p_c$, the expected value of the idle slot ratio is $q_c = (1 - p_c)^N$. On the other hand, if there is exactly one deviating node that transmits with probability $p_d$ during a review phase, the expected value is reduced to $q_d = (1 - p_d)(1 - p_c)^{N-1}$. The idle slot ratio test is passed if the idle slot ratio, $\sum_{k=1}^{L} \chi\{z^{\tau+k} = 0\}/L$, exceeds a threshold value, $q_c - B$, and fails otherwise. Fig. 6 shows an automaton representation of a review strategy $\tilde{\sigma}^r$ whose parameters satisfy $1 < L(q_c - B) < 2$. State transitions occur depending on the received signals, as depicted in Fig. 6. When a review phase ends, nodes either start a new review phase or move to a punishment phase depending on whether the number of idle slots in the review phase exceeds $L(q_c - B)$ or not.



B. Analytical Results 

Suppose that every node follows a review strategy based on the idle slot ratio test, $\tilde{\sigma}^r(B, L, M)$. Then, every node transmits with probability $p_c$ in a review phase, and the number of idle slots occurring in a review phase follows a binomial distribution with parameters $L$ and $q_c$. Thus, the false punishment probability is given by

$$P_f(B, L) = F(L(q_c - B); L, q_c).$$

Since a deviating node using transmission probability $p_d$ changes the "success probability" of the binomial distribution from $q_c$ to $q_d$, the miss detection probability is given by

$$P_m(B, L; p_d) = 1 - F(L(q_c - B); L, q_d).$$
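A short numerical sketch of these two expressions follows, using only the standard library. The function names are illustrative assumptions, and the direct binomial summation is meant for moderate values of $L$.

```python
import math


def binom_cdf(y: float, n: int, p: float) -> float:
    """F(y; n, p): probability that a Binomial(n, p) variable is at most floor(y)."""
    top = min(math.floor(y), n)
    if top < 0:
        return 0.0
    total = sum(math.comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(top + 1))
    return min(1.0, total)


def idle_false_punishment_prob(margin: float, review_len: int,
                               n_nodes: int, p_c: float) -> float:
    """P_f(B, L) for the idle slot ratio test."""
    q_c = (1 - p_c) ** n_nodes
    return binom_cdf(review_len * (q_c - margin), review_len, q_c)


def idle_miss_detection_prob(margin: float, review_len: int, n_nodes: int,
                             p_c: float, p_d: float) -> float:
    """P_m(B, L; p_d) for the idle slot ratio test."""
    q_c = (1 - p_c) ** n_nodes
    q_d = (1 - p_d) * (1 - p_c) ** (n_nodes - 1)
    return 1 - binom_cdf(review_len * (q_c - margin), review_len, q_d)


# N = 5, p_c = 0.2, p_d = 0.7, L = 200; here q_c - q_d = 0.2048.
pf_small_margin = idle_false_punishment_prob(0.1, 200, 5, 0.2)
pm_small_margin = idle_miss_detection_prob(0.1, 200, 5, 0.2, 0.7)
pm_large_margin = idle_miss_detection_prob(0.25, 200, 5, 0.2, 0.7)
```

For $B = 0.1$, which lies below $q_c - q_d$, both error probabilities are already small at $L = 200$, whereas for $B = 0.25$ the miss detection probability is close to one.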

The monotonicity of $P_f$ and $P_m$ with respect to the margin of error $B$ is stated as follows.

Proposition 6: Given $p_d \in (p_c, 1]$ and $L$, $P_f(B, L)$ and $P_m(B, L; p_d)$ are non-increasing and non-decreasing in $B \in (0, q_c)$, respectively.

Proof: The proof is straightforward by noting that $F(L(q_c - B); L, q_c)$ and $F(L(q_c - B); L, q_d)$ are non-increasing in $B \in (0, q_c)$. ■

The next lemma examines the asymptotic properties of $P_f$ and $P_m$ as $L$ becomes large.

Lemma 2: Given $p_d \in (p_c, 1]$, $\lim_{L \to \infty} P_f(B, L) = 0$ for all $B \in (0, q_c)$, $\lim_{L \to \infty} P_m(B, L; p_d) = 0$ for all $B \in (0, q_c - q_d)$, and $\lim_{L \to \infty} P_m(B, L; p_d) = 1$ for all $B \in (q_c - q_d, q_c)$.

Proof: We use the same approach as in the proof of Lemma 1. When every node transmits with probability $p_c$, the idle slot ratio converges almost surely to $q_c$ as $L$ goes to infinity, which implies that the false punishment probability goes to zero for all $B > 0$. When there is exactly one node transmitting with probability $p_d$, the idle slot ratio converges almost surely to $q_d$ as $L$ goes to infinity. Hence, if $q_d < q_c - B$ (resp. $q_d > q_c - B$), the miss detection probability goes to zero (resp. one). ■

Lemma 2 gives a sufficient condition on the idle slot ratio test to apply Proposition 5.

Proposition 7: Suppose that $B \in (0, q_c - q_d)$. For any $\delta > 0$, there exist $L$ and $M$ such that $\tilde{\sigma}^r(B, L, M)$ is DP against $\sigma^d$ and $\delta$-PO.

Proof: The proposition follows from Lemma 2 and Proposition 5. ■

Proposition 7 states that for given $p_d \in (p_c, 1]$, we can always construct a protocol based on the idle slot ratio test that is DP against $\sigma^d$ and achieves an arbitrarily small efficiency loss by choosing $B$ such that $0 < B < q_c - q_d = (p_d - p_c)(1 - p_c)^{N-1}$. As in the case of the ACK ratio test, we have a wider range of $B$ that renders deviation-proofness as $p_d$ becomes larger.

We have considered deviation strategies that prescribe a constant transmission probability in a review phase. We now consider the case where a deviating node can use any strategy in $\Sigma_i$, which includes strategies that adjust transmission probabilities depending on the signals obtained in the current review phase. The following theorem shows that we can construct a protocol based on the idle slot ratio test that is deviation-proof and near-optimal.

Theorem 4: For any $\epsilon > 0$ and $\delta > 0$, there exist $B$, $L$, and $M$ such that $\tilde{\sigma}^r(B, L, M)$ is $\epsilon$-NE and $\delta$-PO.

Proof: The proof is relegated to Appendix B. ■

The interpretation of $\epsilon$ and $\delta$ as performance requirements as well as the trade-off between performance and implementation cost, as discussed following Theorem 2, is still valid in the case of public signals.

Remark: Protocols with sliding windows. Suppose that more than $L(q_c - B)$ idle slots have occurred before the end of a review phase. Then, a deviating node, knowing that a punishment will not occur regardless of the outcome in the remaining slots of the review phase, can increase its transmission probability for the remainder of the review phase to obtain a payoff gain. We can make a protocol based on a review strategy robust to such a manipulation by having sliding windows for review phases. In a review strategy with sliding windows, a review phase begins in each slot unless there is a new or ongoing punishment. Once the idle slot ratio test based on the recent $L$ signals fails, a review stops and punishment occurs for $M$ slots. Once a punishment phase ends, a review phase begins in each slot until another punishment occurs. A detailed analysis of this protocol is left for future research.
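One possible reading of the sliding-window variant is sketched below. The class name, the string encoding of the ternary signals, and the choice to discard the window when a punishment starts are assumptions of this sketch, since the remark leaves the detailed design open.

```python
from collections import deque


class SlidingWindowIdleTest:
    """Idle slot ratio test over the most recent L public signals (illustrative sketch)."""

    def __init__(self, q_c: float, margin: float, review_len: int, punish_len: int) -> None:
        self.q_c = q_c
        self.margin = margin
        self.review_len = review_len
        self.punish_len = punish_len
        self.window = deque(maxlen=review_len)  # most recent review signals
        self.punish_left = 0                    # remaining punishment slots

    def observe(self, signal: str) -> bool:
        """Process the signal of the current slot ("0", "1", or "e").

        Returns True if the current slot belongs to a punishment phase.
        """
        if self.punish_left > 0:
            self.punish_left -= 1
            return True
        self.window.append(signal)
        if len(self.window) == self.review_len:
            idle_ratio = sum(1 for z in self.window if z == "0") / self.review_len
            if idle_ratio <= self.q_c - self.margin:
                # Assumption: reviews restart from scratch after the punishment.
                self.window.clear()
                self.punish_left = self.punish_len
        return False


# Toy run with L = 4, M = 2, q_c = 0.32768, and B = 0.1.
test = SlidingWindowIdleTest(q_c=0.32768, margin=0.1, review_len=4, punish_len=2)
trace = [test.observe(z) for z in ["0", "1", "e", "1", "e", "0", "0", "0"]]
```

In the toy run, the first full window contains one idle slot and passes; the next window contains none, so the following two slots are punishment slots.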

C. Numerical Results 

We provide numerical results to demonstrate the findings on DP protocols with public signals. Again, we consider a network with $N = 5$ and $p_c = 1/N = 0.2$ while varying $p_d$ and the protocol parameters.

Fig. 7 plots $P_f$ and $P_m$ against the length of a review phase $L$ for $B = 0.1$ and $0.25$ when $p_d = 0.7$. As in the case of private signals, $P_f$ tends to decrease with $L$ and approaches zero for large $L$. Also, $P_f$ is smaller for a larger margin of error $B$. Note that the upper threshold for $B$ to yield $\lim_{L \to \infty} P_m(B, L; p_d) = 0$ in Lemma 2 is $q_c - q_d = 0.2048$. We can see that when $B$ is larger than this threshold, $P_m$ tends to increase with $L$ and approaches 1 for large $L$. In contrast, when $B$ is smaller than the threshold, $P_m$ approaches zero for large $L$, making the test asymptotically perfect.

Fig. 8 plots the minimum length of a punishment phase $\lceil M_{\min}(B, L; p_d) \rceil$ to have a DP protocol as a function of the length of a review phase $L$. We can see that for fixed $p_d = 0.7$, a longer punishment phase is needed for larger $B$, except when $L$ is small, and that DP protocols cannot be constructed with some small values of $L$ when $B = 0.1$ (displayed as $\lceil M_{\min}(B, L; p_d) \rceil = 0$). Also, when $B = 0.1$, a longer punishment phase is needed for $p_d = 1$ than for $p_d = 0.7$. The efficiency loss of DP protocols with the minimum length of a punishment phase is shown in Fig. 9. We can see that larger $B$ results in smaller efficiency loss, because $P_f$ is smaller for larger $B$ as shown in Proposition 6. Also, efficiency loss approaches zero as $L$ becomes large, which is consistent with Proposition 7.

VII. Extension to a CSMA/CA Network with 
Selfish Nodes 

In this section, we discuss how the proposed protocols based on a review strategy can be modified for a CSMA/CA network. In [9], the authors consider a CSMA/CA network in which a selfish node uses a fixed contention window size. They show a discrepancy between NE and Pareto optimum. The contention window size of each node at the unique PO outcome is denoted by $W^*$, which results in a transmission probability $p_c = 2/(W^* + 1)$. The optimal payoff $u^{PO}$, i.e., the throughput at Pareto optimum, can be computed using Eq. (1) of [9], based on the model of [29].



Fig. 7. $P_f(B, L)$ and $P_m(B, L; p_d)$ versus the length of a review phase $L$ when $p_d = 0.7$.

Fig. 8. The minimum length of a punishment phase $\lceil M_{\min}(B, L; p_d) \rceil$ versus the length of a review phase $L$: (a) $p_d = 0.7$, and (b) $B = 0.1$.

Fig. 9. Efficiency loss $C(\tilde{\sigma}^r)$ versus the length of a review phase $L$: (a) $p_d = 0.7$, and (b) $B = 0.1$.

A review strategy for a CSMA/CA network can be described as follows, assuming private signals (i.e., sensing information is private). At the beginning, nodes are synchronized to start a review phase. In a review phase, which lasts for $L$ time periods, each node sets its window size at $W^*$. After a review phase, each node computes its actual throughput, denoted by $\tau_i$, and compares it with $u^{PO}$, the expected throughput when no node has deviated from $W^*$. A deviating node chooses its window size smaller than $W^*$ in order to increase its transmission probability from $p_c$ and thus to obtain a higher throughput. Since a deviation decreases the throughput of the well-behaved nodes, we can design a test such that the test performed by node $i$ is passed if and only if $\tau_i > u^{PO} - B$ for some constant $B \in (0, u^{PO})$. If the test of node $i$ is passed, node $i$ moves to a cooperation phase during which it continues to set its window size at $W^*$. Otherwise, it moves to a punishment phase during which it sets its window size at the minimum value 1. A reciprocation phase lasts for $M$ time periods, and a new review phase begins after a reciprocation phase.

As in a slotted Aloha network, $\tau_i$ converges almost surely to $u^{PO}$ as $L$ goes to infinity, and thus the proposed test can be made asymptotically perfect by choosing an appropriate value of $B$. Hence, when window sizes take discrete values, we can construct a protocol that is DP against any constant deviation strategy and achieves a small efficiency loss, following a similar approach to Theorem 2. We omit the details due to lack of space.

VIII. Conclusion 

It is well-known that the decentralized operation of multiple 
access communication systems with selfish nodes often results 
in an inefficient use of a shared medium. To overcome this 
problem, we have proposed new classes of slotted MAC pro- 
tocols that are robust to selfish manipulation while achieving 
near-optimality. The proposed protocols are based on the idea 
of a review strategy in the theory of repeated games. With 
the proposed protocols, nodes perform a statistical test to 
determine whether a deviation has occurred and trigger a pun- 
ishment when they conclude so. We have provided conditions 
under which we can design deviation-proof protocols with a 
small efficiency loss and illustrated the results with particular 
statistical tests. Our framework and design methodology are 
not limited to multiple access communications. They can be 
applied to other networking and communication scenarios in 
which agents obtain imperfect signals about the decisions of 
other agents and a deviation influences the distribution of 
signals. 

Appendix A 
Proof of Theorem 2 

Choose arbitrary $\epsilon > 0$ and $\delta > 0$. Define $p_\epsilon = p_c + \epsilon/(1 - p_c)^{N-1}$. Note that $p_\epsilon$ is the minimum deviation probability with which a deviating node gains at least $\epsilon$ in a slot when other nodes transmit with probability $p_c$. Choose $B \in (0, \epsilon/(N - 1))$. Note that $q_c - q_d \ge \epsilon/(N - 1)$ for all $p_d \in [p_\epsilon, 1]$. Define



{l-Pf{B,L)y 
^ P,n{B,L;p,). 



(l-p,)(l-P/(i?,L)) 



Since $P_m(B, L; p_d)$ is non-increasing in $p_d$, we have $g(B, L; p_d) \ge g(B, L)$ for all $p_d \in [p_\epsilon, 1]$, where $g(B, L; p_d)$ is defined in (3). Also, by Lemma 1, we have $\lim_{L \to \infty} P_f(B, L) = 0$ and $\lim_{L \to \infty} P_m(B, L; p_\epsilon) = 0$. Therefore, $\lim_{L \to \infty} g(B, L) = p_c$, and thus there exists $L_1$ such that $g(B, L; p_d) > 0$ for all $p_d \in [p_\epsilon, 1]$, for all $L \ge L_1$. Define $M(L) = \lceil (1 - p_c)L / g(B, L) \rceil$. Since $M(L) \ge M_{\min}(B, L; p_d)$ for all $p_d \in [p_\epsilon, 1]$, protocol $\sigma^r(B, L, M(L))$ is DP against all constant strategies using $p_d \in [p_\epsilon, 1]$, for all $L \ge L_1$.

Since $C(\sigma^r)$ is non-decreasing in $M$, we have

N{{l~p,)L/g{B,L) + l) 



< CK(B,L,M(L))) < 
xil-Pcf-' 



L+{{l-p,)L/g{B,L) + 
p,Pf - (1 - Pf)"^ + {1 - Pf 



1) 



Therefore, $\lim_{L \to \infty} C(\sigma^r(B, L, M(L))) = 0$, and there exists $L_2$ such that $C(\sigma^r) < \delta$ for all $L \ge L_2$. Choose $L \ge \max\{L_1, L_2\}$ and $M = M(L)$. Then $\sigma^r(B, L, M)$ is DP against all constant strategies using $p_d \in [p_\epsilon, 1]$ and satisfies $C(\sigma^r) < \delta$. Finally, note that the payoff gain from deviating to a constant strategy using $p_d \in [0, p_\epsilon)$ is bounded above by $\epsilon$. Hence, $\sigma^r(B, L, M)$ is robust $\epsilon$-DP and $\delta$-PO. This completes the proof.

Appendix B 
Proof of Theorem 4 

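Consider the problem of a deviating node maximizing its payoff given that all the other nodes use a review strategy $\tilde{\sigma}^r(B, L, M)$, i.e., $\max_{\sigma \in \Sigma_i} U(\sigma; \tilde{\sigma}^r)$. We can define a state space with a total of $L(L+1)/2 + M$ states, where a state is a pair consisting of the slot position and the number of idle slots since the beginning of the current review phase in the case of a review phase, while it is the slot position in the case of a punishment phase. By the principle of dynamic programming, we can obtain a stationary optimal strategy, denoted by $\sigma^*$. Let $p_t$ be the expected value of the transmission probability of a node using $\sigma^*$ in slot $t$ of a review phase (conditional on null history) when other nodes follow $\tilde{\sigma}^r$. Let $I_t = \chi\{z^t = 0\}$. Since $E[I_t] = (1 - p_t)(1 - p_c)^{N-1}$, we have

$$U(\sigma^*; \tilde{\sigma}^r) = \frac{(1 - p_c)^{N-1} \sum_{t=\tau+1}^{\tau+L} p_t}{L + P_f^* M} = \frac{L(1 - p_c)^{N-1} - E\left[\sum_{t=\tau+1}^{\tau+L} I_t\right]}{L + P_f^* M}, \tag{9}$$

where $\tau + 1$ is the first slot of a review phase and $P_f^*$ is the punishment probability when the deviating node uses $\sigma^*$, i.e., $P_f^* = \Pr\{\sum_{t=\tau+1}^{\tau+L} I_t \le L((1 - p_c)^N - B)\}$. Here the expected idle slot ratio without deviation is written out as $(1 - p_c)^N$, and $q_c$ denotes the per-node success probability $p_c(1 - p_c)^{N-1}$ as in (8). Since $\sum_{t=\tau+1}^{\tau+L} I_t \ge 0$, using Markov's inequality, we have

$$E\left[\sum_{t=\tau+1}^{\tau+L} I_t\right] \ge (1 - P_f^*) L ((1 - p_c)^N - B). \tag{10}$$

Combining (9) and (10), we obtain

$$U(\sigma; \tilde{\sigma}^r) \le \frac{L q_c + P_f^* (1 - p_c)^N L + (1 - P_f^*) B L}{L + P_f^* M} \tag{11}$$

for all $\sigma \in \Sigma_i$.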

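Choose arbitrary $\epsilon > 0$ and $\delta > 0$. Following [20], we relate the choice of $M$ and $B$ to $L$ as follows:

$$B = \beta L^{\rho - 1}, \quad \beta > 0, \quad \tfrac{1}{2} < \rho < 1,$$

$$M = \mu L, \quad \mu > 0.$$

Fix $\beta$, $\rho$, and $\mu$ such that $\beta > 0$, $1/2 < \rho < 1$, and $\mu \ge N - 1$. By Chebyshev's inequality,

$$P_f(B, L) \le \frac{(1 - p_c)^N [1 - (1 - p_c)^N]}{L B^2} = \frac{(1 - p_c)^N [1 - (1 - p_c)^N]}{\beta^2 L^{2\rho - 1}}. \tag{12}$$

Also, note that

$$C(\tilde{\sigma}^r) = \frac{N P_f M q_c}{L + P_f M} = \frac{N P_f \mu q_c}{1 + P_f \mu}. \tag{13}$$

Since $P_f(B, L)$ in (12) converges to zero as $L$ goes to infinity, we can achieve an arbitrarily small efficiency loss in (13) by choosing sufficiently large $L$. In other words, for any $\delta > 0$, there exists $L'_\delta$ such that $C(\tilde{\sigma}^r) < \delta$ for all $L \ge L'_\delta$. With $\mu \ge N - 1$, the upper bound on the deviation payoff in (11),

$$\frac{q_c + P_f^* (1 - p_c)^N + (1 - P_f^*) \beta L^{\rho - 1}}{1 + P_f^* \mu},$$

is decreasing in $P_f^*$. Thus, the deviation payoff is bounded above by $q_c + \beta L^{\rho - 1}$. Choose $L$ such that

$$L \ge \max\left\{ L'_\delta,\ L'_{N\epsilon/2},\ (2\beta/\epsilon)^{1/(1 - \rho)} \right\}.$$

Since $L \ge L'_{N\epsilon/2}$, we have

$$q_c - \epsilon/2 < U(\tilde{\sigma}^r; \tilde{\sigma}^r) \le U(\sigma^*; \tilde{\sigma}^r). \tag{14}$$

Since $L \ge (2\beta/\epsilon)^{1/(1 - \rho)}$, we have

$$U(\sigma^*; \tilde{\sigma}^r) \le q_c + \beta L^{\rho - 1} \le q_c + \epsilon/2. \tag{15}$$

Then, by (14) and (15), we obtain an upper bound on the deviation gain as

$$U(\sigma^*; \tilde{\sigma}^r) - U(\tilde{\sigma}^r; \tilde{\sigma}^r) < \epsilon,$$

which proves that $\tilde{\sigma}^r(B, L, M)$ is an $\epsilon$-NE. Lastly, since $L \ge L'_\delta$, we have $C(\tilde{\sigma}^r) < \delta$, and thus $\tilde{\sigma}^r(B, L, M)$ is $\delta$-PO.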
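As a sanity check on the scaling $B = \beta L^{\rho - 1}$, the sketch below compares the exact false punishment probability of the idle slot ratio test with a Chebyshev-type bound of the form $q(1 - q)/(\beta^2 L^{2\rho - 1})$, where $q = (1 - p_c)^N$, and evaluates the review length $(2\beta/\epsilon)^{1/(1 - \rho)}$ used in the last step of the proof. The function names and the sample values of $\beta$, $\rho$, and $\epsilon$ are hypothetical.

```python
import math


def binom_cdf(y: float, n: int, p: float) -> float:
    """F(y; n, p): probability that a Binomial(n, p) variable is at most floor(y)."""
    top = min(math.floor(y), n)
    if top < 0:
        return 0.0
    total = sum(math.comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(top + 1))
    return min(1.0, total)


def chebyshev_bound(beta: float, rho: float, review_len: int,
                    n_nodes: int, p_c: float) -> float:
    """Upper bound q(1 - q) / (beta^2 * L^(2*rho - 1)) on P_f, with q = (1 - p_c)^N."""
    q_idle = (1 - p_c) ** n_nodes
    return q_idle * (1 - q_idle) / (beta**2 * review_len ** (2 * rho - 1))


def exact_false_punishment(beta: float, rho: float, review_len: int,
                           n_nodes: int, p_c: float) -> float:
    """Exact P_f(B, L) of the idle slot ratio test with B = beta * L^(rho - 1)."""
    q_idle = (1 - p_c) ** n_nodes
    margin = beta * review_len ** (rho - 1)
    return binom_cdf(review_len * (q_idle - margin), review_len, q_idle)


def min_review_len_for_gain(beta: float, rho: float, eps: float) -> int:
    """Smallest integer L with L >= (2*beta/eps)^(1/(1 - rho))."""
    return math.ceil((2 * beta / eps) ** (1 / (1 - rho)))


# Hypothetical parameters: beta = 0.5, rho = 0.75, N = 5, p_c = 0.2.
checks = [(L, exact_false_punishment(0.5, 0.75, L, 5, 0.2),
           chebyshev_bound(0.5, 0.75, L, 5, 0.2)) for L in (50, 100, 400)]
l_min = min_review_len_for_gain(0.5, 0.75, eps=0.1)
```

The exact probability stays below the bound for each listed $L$, and the bound itself shrinks as $L$ grows because $2\rho - 1 > 0$.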